62 research outputs found
Improving the Latency and Throughput of ZooKeeper Atomic Broadcast
ZooKeeper is a crash-tolerant system that offers fundamental services to Internet-scale applications, thereby simplifying their development and hosting. It consists of three or more servers that form a replicated state machine. Maintaining these replicas in a mutually consistent state requires executing an atomic broadcast protocol, Zab, so that concurrent requests for state changes are serialised identically at all replicas before being acted upon.
Thus, ZooKeeper performance for update operations is determined by Zab performance. We contribute two easy-to-implement Zab variants, called ZabAC and ZabAA. They are designed to offer small atomic-broadcast latencies and to reduce the processing load on the primary node that plays the leading role in Zab. The former improves ZooKeeper performance and the latter enables ZooKeeper to face more challenging load conditions.
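The replicated-state-machine idea behind Zab can be illustrated with a minimal sketch (hypothetical names; this is not Zab itself, which adds leader election, quorum acknowledgements, and recovery): a primary stamps every update with a global sequence number, and each replica applies updates strictly in that order, so all replicas serialise concurrent requests identically.

```python
# Minimal illustration of primary-ordered broadcast for a replicated
# state machine. NOT Zab: leader election, quorum acknowledgements and
# crash recovery are omitted; it only shows why identical serialisation
# keeps replicas mutually consistent.

class Primary:
    def __init__(self):
        self.next_seq = 0

    def broadcast(self, update, replicas):
        seq = self.next_seq          # stamp the update with a global order
        self.next_seq += 1
        for r in replicas:
            r.deliver(seq, update)

class Replica:
    def __init__(self):
        self.state = {}
        self.pending = {}            # out-of-order updates wait here
        self.expected = 0

    def deliver(self, seq, update):
        self.pending[seq] = update
        # apply updates strictly in sequence-number order
        while self.expected in self.pending:
            key, value = self.pending.pop(self.expected)
            self.state[key] = value
            self.expected += 1

primary = Primary()
replicas = [Replica() for _ in range(3)]
primary.broadcast(("x", 1), replicas)
primary.broadcast(("x", 2), replicas)
assert all(r.state == {"x": 2} for r in replicas)
```

Because every replica applies the same updates in the same sequence-number order, all replicas end in the same state regardless of message interleaving.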
Design and development of algorithms for fault tolerant distributed systems
PhD Thesis. This thesis describes the design and development of algorithms for fault-tolerant distributed systems. The development of such algorithms requires making assumptions about the types of component faults for which tolerance is to be provided. Such assumptions must be specified accurately. To this end, this thesis develops a classification of faults in systems. This fault classification identifies a range of fault types, from the most restricted to the least restricted. For each fault type, an algorithm for reaching distributed agreement in the presence of a bounded number of faulty processors is developed, and thus a family of agreement algorithms is presented. The influence of the various fault types on the complexities of these algorithms is discussed. Early-stopping algorithms are also developed for selected fault types, and the influence of fault types on the early-stopping conditions of the respective algorithms is analysed. The problem of evaluating the performance of distributed replicated systems which will require agreement algorithms is considered next. As a first step towards meeting this challenging task, a pipeline triple modular redundant system is considered and analytical methods are derived to evaluate its performance. Finally, the accuracy of these methods is examined using computer simulations. UK Science and Engineering Research Council (SERC), DELTA-4 consortium of ESPRIT.
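For the most restricted fault type in such a classification, crash faults, a classic agreement algorithm floods known values for f+1 synchronous rounds and then decides deterministically. The sketch below is illustrative only, not one of the thesis's algorithms; the crash model (a processor crashes mid-round after sending to a prefix of receivers) is a simplifying assumption.

```python
# Flooding consensus tolerating up to f crash faults in a synchronous
# system: run f+1 rounds; in each round every live processor sends its
# set of known values to all processors. With at most f crashes there is
# at least one crash-free round, after which all live processors hold
# identical sets and decide identically (here: on the minimum value).

def flooding_consensus(values, f, crash_plan=None):
    """values: {pid: initial value}.
    crash_plan: {pid: (round, k)} -- pid crashes in that round after
    sending to only its first k receivers, then stays silent."""
    crash_plan = crash_plan or {}
    pids = sorted(values)
    known = {p: {values[p]} for p in pids}
    dead = set()
    for rnd in range(1, f + 2):                    # f+1 rounds
        msgs = []
        for s in pids:
            if s in dead:
                continue
            receivers = pids
            if s in crash_plan and crash_plan[s][0] == rnd:
                receivers = pids[:crash_plan[s][1]]  # partial send, then crash
                dead.add(s)
            for r in receivers:
                msgs.append((r, set(known[s])))
        for r, vals in msgs:
            if r not in dead:
                known[r] |= vals
        # end of round: all queued messages delivered to live processors
    live = [p for p in pids if p not in dead]
    return {p: min(known[p]) for p in live}        # identical decisions

# Processor 3 crashes in round 1 after reaching only processor 1;
# the extra round re-floods its value, so the live processors still agree.
decisions = flooding_consensus({1: 5, 2: 3, 3: 9}, f=1, crash_plan={3: (1, 1)})
assert set(decisions.values()) == {3}
```

Less restricted fault types (e.g. arbitrary faults) require strictly more machinery, such as authenticated messages or higher replication, which is why a fault classification directly shapes algorithm complexity.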
DIGITALIZATION OF ENTERPRISE WITH ENSURING STABILITY AND RELIABILITY
This article describes the development of an information system for automating the business processes of a modern enterprise while ensuring stability and reliability, implemented in applications developed by the authors. The goal is to improve the core digitalization processes of enterprises for sustainable functioning. The authors carried out a deep analysis and described the main stages of the enterprise digitalization process: the document-approval process, personnel-management business processes, etc. The architecture of the information system, descriptions of its business processes, and the reliability and fault-tolerance principles of the system under development are presented. The developed desktop client application connects to the information system from enterprise workstations over a local network with access to the application server. This reduces damage from accidental or deliberate incorrect actions by users and administrators, and supports separation of protection, diversity of protection mechanisms, and simplicity and manageability of the information system and its security subsystem.
Know your customer: balancing innovation and regulation for financial inclusion
Financial inclusion depends on providing adjusted services for citizens with
disclosed vulnerabilities. At the same time, the financial industry needs to
adhere to a strict regulatory framework, which is often in conflict with the
desire for inclusive, adaptive, and privacy-preserving services. In this
article we study how this tension impacts the deployment of privacy-sensitive
technologies aimed at financial inclusion. We conduct a qualitative study with
banking experts to understand their perspectives on service development for
financial inclusion. We build and demonstrate a prototype solution based on
open source decentralized identifiers and verifiable credentials software and
report on feedback from the banking experts on this system. The technology is
promising thanks to its selective disclosure of vulnerabilities under the full
control of the individual. This supports GDPR requirements, but at the same
time, there is a clear tension between introducing these technologies and
fulfilling other regulatory requirements, particularly with respect to 'Know
Your Customer.' We consider the policy implications stemming from these
tensions and provide guidelines for the further design of related technologies. Comment: Published in the journal Data & Policy.
A Middleware Architecture for Intrusion Tolerant Service Replication
This paper presents a novel combination of known techniques for building a middleware which can support service replication in a hostile environment where a node can get corrupted and fail arbitrarily, and where message transfer delays cannot be accurately bounded. Using localised replication and output comparison, fail-arbitrary behaviour is reduced to fail-signal: the middleware process of a corrupted server site fails only by emitting a fail-signal, and eventually fails permanently. With this failure mode, it is possible to circumvent the FLP impossibility result, which applies to undetectable crash failures; specifically, the termination of a deterministic asynchronous order protocol can be guaranteed even if network delays fluctuate arbitrarily (due to network intrusions) for an indefinite period. We show how the reduction to fail-signal is achieved and present a deterministic message-ordering protocol. We then argue that several well-known crash-tolerant order protocols can be re-used with little re-design within the proposed middleware.
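The core of the fail-signal reduction can be sketched as follows (hypothetical names; the paper's middleware involves more machinery than this): replicas of a server process execute each request, and a comparator releases the output only if the replicas agree, otherwise it announces failure and falls permanently silent. A corrupted node thus cannot emit a wrong message silently.

```python
# Sketch of a fail-signal construction by localised replication and
# output comparison (illustrative only, not the paper's middleware).
# Two replicas compute each output; on disagreement the node emits a
# fail-signal and then fails permanently, so its failures are announced
# rather than arbitrary.

class FailSignalled(Exception):
    """Raised once the node has announced failure; it stays silent after."""

class FailSignalNode:
    def __init__(self, replica_a, replica_b):
        self.replicas = (replica_a, replica_b)
        self.failed = False

    def handle(self, request):
        if self.failed:
            raise FailSignalled("node already failed")   # permanent silence
        out_a = self.replicas[0](request)
        out_b = self.replicas[1](request)
        if out_a != out_b:                # corruption detected by comparison
            self.failed = True
            raise FailSignalled("FAIL-SIGNAL: replica outputs diverge")
        return out_a

correct = lambda x: x * 2
corrupted = lambda x: x * 2 + (1 if x == 7 else 0)   # misbehaves on input 7

node = FailSignalNode(correct, corrupted)
assert node.handle(3) == 6                # replicas agree: output released
try:
    node.handle(7)                        # replicas diverge: fail-signal
except FailSignalled:
    pass
assert node.failed
```

This detectability is what lets crash-tolerant order protocols terminate: a fail-signal is an explicit notification, unlike a crash that can never be distinguished from a slow network.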
Enhancing Replica Management Services to Cope with Group Failures
In a distributed system, replication of components, such as objects, is a well-known way of achieving availability. For increased availability, crashed and disconnected components must be replaced by new components on available spare nodes. This replacement results in the membership of the replicated group 'walking' over a number of machines during system operation. In this context, we address the problem of reconfiguring a group after the group as an entity has failed. Such a failure is termed a group failure, which, for example, can be the crash of every component in the group or the group being partitioned into minority islands. The solution assumes crash-proof storage, and eventual recovery of crashed nodes and healing of partitions. It guarantees that (i) the number of groups reconfigured after a group failure is never more than one, and (ii) the reconfigured group contains a majority of the components which were members of the group just before the group failure occurred, so that the loss of state information due to a group failure is minimal. Though the protocol is subject to blocking, it remains efficient in terms of communication rounds and use of stable store, during both normal operation and reconfiguration after a group failure.
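The two guarantees rest on a simple majority argument, which can be sketched as follows (hypothetical names, not the paper's protocol): each component logs the last agreed membership in stable storage, and a set of recovered components may form the new group only if it contains a majority of that membership.

```python
# Illustrative majority rule for reconfiguring a replica group after a
# group failure (a sketch, assuming membership is logged in crash-proof
# storage). Because two disjoint sets cannot both contain a majority of
# the same membership, at most one new group can ever form; and a
# majority of the old members carries most of the pre-failure state.

def may_reconfigure(last_membership, recovered):
    """True iff `recovered` contains a majority of `last_membership`."""
    survivors = set(last_membership) & set(recovered)
    return len(survivors) > len(last_membership) // 2

members = ["a", "b", "c", "d", "e"]
assert may_reconfigure(members, ["a", "b", "c"])     # 3 of 5: may reform
assert not may_reconfigure(members, ["d", "e"])      # minority island: blocked
```

The blocking behaviour noted in the abstract follows directly: until some majority of the old membership has recovered and reconnected, no reconfiguration can proceed.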